Kenneth Tay
Oct 22, 2019
readr
The maps package contains a lot of outlines of continents, countries, states, and counties. ggplot2’s map_data() function puts these outlines in data frame format, which then allows us to plot them with ggplot().
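library(ggplot2)  # provides map_data(); the maps package must be installed
library(dplyr)    # provides %>% and filter()
county_data <- map_data("county")
CA_data <- county_data %>% filter(region == "california")
head(CA_data)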
## long lat group order region subregion
## 1 -121.4785 37.48290 157 6965 california alameda
## 2 -121.5129 37.48290 157 6966 california alameda
## 3 -121.8853 37.48290 157 6967 california alameda
## 4 -121.8968 37.46571 157 6968 california alameda
## 5 -121.9254 37.45998 157 6969 california alameda
## 6 -121.9483 37.47717 157 6970 california alameda
County outlines are drawn using geom_polygon. coord_quickmap() preserves the aspect ratio of the map.
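As a rough illustration of the two functions just mentioned, the sketch below draws the California county outlines. The white fill and black border are assumptions chosen for readability, not settings taken from the slides.

```r
library(ggplot2)  # map_data() also needs the maps package installed
library(dplyr)

county_data <- map_data("county")
CA_data <- county_data %>% filter(region == "california")

# group = group keeps each county outline as its own polygon
p <- ggplot(CA_data, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "white", color = "black") +  # assumed styling
  coord_quickmap()                                 # keeps the map's aspect ratio
p
```

Leaving out the group aesthetic would connect the points of different counties into one tangled polygon, which is why it is set here.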
## # A tibble: 3 x 2
## County Drought_percent
## <chr> <dbl>
## 1 alameda 100
## 2 alpine 100
## 3 amador 100
## long lat group order region subregion
## 1 -121.4785 37.4829 157 6965 california alameda
## 2 -121.5129 37.4829 157 6966 california alameda
## 3 -121.8853 37.4829 157 6967 california alameda
Our drought data and mapping information are in different datasets!
Sometimes our data are spread across different datasets, making it difficult to answer some questions.
Question: Who scored the highest in English in each class?
## Name Class
## 1 Andrew A
## 2 John B
## 3 Mary A
## 4 Jane B
## Name Subject Score
## 1 John English 76
## 2 Andrew English 66
## 3 John Math 85
## 4 Mary English 71
Joining the two datasets on Name (with dplyr) gives:
## Name Class Subject Score
## 1 Andrew A English 66
## 2 John B English 76
## 3 John B Math 85
## 4 Mary A English 71
## 5 Jane B <NA> NA
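The joined table above can be reproduced with a left join. This is a minimal sketch: the data frame names bio and scores, and the way they are built here, are assumptions reconstructed from the printed tables.

```r
library(dplyr)

# Assumed reconstruction of the two small tables shown above
bio <- data.frame(
  Name  = c("Andrew", "John", "Mary", "Jane"),
  Class = c("A", "B", "A", "B"),
  stringsAsFactors = FALSE
)
scores <- data.frame(
  Name    = c("John", "Andrew", "John", "Mary"),
  Subject = c("English", "English", "Math", "English"),
  Score   = c(76, 66, 85, 71),
  stringsAsFactors = FALSE
)

# Left join: keep every row of bio, attach matching rows of scores by Name
joined <- left_join(bio, scores, by = "Name")
joined
```

Jane has no rows in scores, so her Subject and Score are filled with NA. John has two score rows, so he appears twice.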
Question: Who scored the highest in English in each class?
library(dplyr)
bio %>%
  left_join(scores, by = "Name") %>%
  filter(Subject == "English") %>%
  group_by(Class) %>%
  top_n(1, Score)
## # A tibble: 2 x 4
## # Groups: Class [2]
## Name Class Subject Score
## <chr> <chr> <chr> <dbl>
## 1 John B English 76
## 2 Mary A English 71
Our drought data and mapping information are in different datasets!
Solution: Join the datasets together.
combined_data <- CA_data %>%
  left_join(drought_data, by = c("subregion" = "County"))
head(combined_data)
## long lat group order region subregion Drought_percent
## 1 -121.4785 37.48290 157 6965 california alameda 100
## 2 -121.5129 37.48290 157 6966 california alameda 100
## 3 -121.8853 37.48290 157 6967 california alameda 100
## 4 -121.8968 37.46571 157 6968 california alameda 100
## 5 -121.9254 37.45998 157 6969 california alameda 100
## 6 -121.9483 37.47717 157 6970 california alameda 100
Map the fill attribute of geom_polygon to the Drought_percent column. Use scale_fill_distiller to define a more appropriate color scale.
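The two steps above might look like the sketch below. To keep it self-contained, drought_data is replaced by a placeholder with made-up values (not real drought figures), and the "YlOrRd" palette with direction = 1 is an assumed choice rather than the one used in the slides.

```r
library(ggplot2)  # map_data() also needs the maps package installed
library(dplyr)

CA_data <- map_data("county") %>% filter(region == "california")

# PLACEHOLDER stand-in for drought_data: values are invented for illustration
drought_data <- CA_data %>%
  distinct(subregion) %>%
  transmute(County = subregion,
            Drought_percent = seq(0, 100, length.out = n()))

combined_data <- CA_data %>%
  left_join(drought_data, by = c("subregion" = "County"))

p <- ggplot(combined_data, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = Drought_percent), color = "black") +
  scale_fill_distiller(palette = "YlOrRd", direction = 1) +  # assumed palette
  coord_quickmap()
p
```

Setting direction = 1 makes higher percentages darker, which tends to read more naturally for a severity measure.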
Optional material
Inner join: Matches pairs of observations with equal keys, drops everything else. Hence, only keeps observations which appear in both datasets.
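A small sketch of an inner join, using assumed reconstructions of the two example tables from earlier (the names bio and scores and their construction are assumptions):

```r
library(dplyr)

# Assumed reconstruction of the example tables
bio <- data.frame(
  Name  = c("Andrew", "John", "Mary", "Jane"),
  Class = c("A", "B", "A", "B"),
  stringsAsFactors = FALSE
)
scores <- data.frame(
  Name    = c("John", "Andrew", "John", "Mary"),
  Subject = c("English", "English", "Math", "English"),
  Score   = c(76, 66, 85, 71),
  stringsAsFactors = FALSE
)

# Inner join: only names present in BOTH tables survive
inner <- inner_join(bio, scores, by = "Name")
inner
```

Jane appears in bio but has no scores, so she is dropped from the result, unlike in a left join where she would be kept with NA values.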
After matching pairs of observations with equal keys…